BUG: Add warning if rows have more columns than expected #33782

mproszewska · 2020-04-25T02:08:16Z

closes #33037 (Inconsistent behavior of read_csv when given an additional value on the first row of CSV file)
tests added and passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
adds a warning when rows being read have more columns than expected

mproszewska · 2020-04-25T10:25:18Z

Multiple tests are failing because of the new warning.
Before fixing them, I'd like to ask: is this solution okay?

gfyoung · 2020-05-03T02:34:36Z

pandas/io/parsers.py

@@ -2508,6 +2512,13 @@ def read(self, rows=None):
            content = content[1:]

        alldata = self._rows_to_cols(content)
+        if len(columns) != len(alldata) and notna(alldata[len(columns) :]).any():


Is there a reason we need the notna check?

mproszewska · May 5, 2020 (reply)

In the example mentioned in the linked issue, an additional comma was added in one row. I assumed that extra trailing commas are common, so we can ignore them and not raise a warning.
I'm using notna to check whether the data that won't be included contains only NaN values.
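The reasoning above (ignore extra fields that are empty, warn only when real data would be dropped) can be sketched outside pandas. This is a minimal illustration using the standard `csv` module, not the code from this PR: the function name `find_rows_with_extra_data` is hypothetical, an empty string stands in for NaN, and `UserWarning` is an assumed category.

```python
import csv
import warnings
from io import StringIO


def find_rows_with_extra_data(text):
    """Return the 1-based line numbers of rows whose extra fields hold data.

    Hypothetical helper for illustration only; it mimics the idea of the
    notna check by treating empty extra fields as missing values.
    """
    reader = csv.reader(StringIO(text))
    header = next(reader)
    expected = len(header)

    offending = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        extra = row[expected:]
        # A trailing comma yields an empty extra field, which is ignored.
        if any(field.strip() for field in extra):
            offending.append(line_no)

    if offending:
        warnings.warn(
            f"Rows on lines {offending} have more values than the "
            f"{expected} expected columns",
            UserWarning,  # assumed category, not taken from the PR
            stacklevel=2,
        )
    return offending
```

With this sketch, a file whose rows only end in a trailing comma produces no warning, while a row carrying a real fourth value does.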

gfyoung · 2020-05-03T02:35:35Z

@mproszewska : Sorry for the long wait here! Overall, the solution is on the right track.

pandas/tests/io/parser/test_common.py

pep8speaks · 2020-05-10T19:56:25Z

Hello @mproszewska! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-10-08 19:27:23 UTC

mproszewska · 2020-05-10T21:07:06Z

I have no idea how to check where this ipython directive error comes from.

jbrockmendel · 2020-09-14T20:46:07Z

can you rebase and I'll take a look at the ipython thing

mproszewska · 2020-10-08T20:31:29Z

> can you rebase and I'll take a look at the ipython thing

I rebased it.

jbrockmendel · 2020-11-02T23:00:38Z

on the docbuild, it looks like the following is issuing a warning

data = "a,b,c\n4,apple,bat,\n8,orange,cow,"
pd.read_csv(StringIO(data), index_col=False)

under this PR, is issuing a warning here the correct thing to do? If so, then an :okwarning: needs to go in io.rst on L756
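To check what the snippet above does on a given pandas version, the warnings can be captured explicitly. This is a sketch that only adds the imports the snippet needs; whether a warning appears depends on the installed pandas version, so no particular output is assumed here.

```python
import warnings
from io import StringIO

import pandas as pd

# Header has 3 fields; each data row ends with a trailing comma (4 fields).
data = "a,b,c\n4,apple,bat,\n8,orange,cow,"

with warnings.catch_warnings(record=True) as caught:
    # Record every warning instead of letting filters hide repeats.
    warnings.simplefilter("always")
    df = pd.read_csv(StringIO(data), index_col=False)

print(df)
# List whatever warnings this pandas version emitted, if any.
for w in caught:
    print(type(w.message).__name__, w.message)
```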

mproszewska · 2020-11-03T00:55:38Z

> on the docbuild, it looks like the following is issuing a warning
>
> data = "a,b,c\n4,apple,bat,\n8,orange,cow,"
> pd.read_csv(StringIO(data), index_col=False)
>
> under this PR, is issuing a warning here the correct thing to do? If so, then an :okwarning: needs to go in io.rst on L756

I think so. The first row has 3 values and the rest have 4. Where in io.rst should :okwarning: be added? Maybe there's another way to do that. It shouldn't be a common warning.

jreback · 2020-11-26T19:15:17Z

conceptually this is ok. pls merge master and will re-look (and yes we would have to either fix the warnings or assert_produces_warning, though prob should fix the incorrect usages).
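For tests that are expected to trigger the warning, the `assert_produces_warning` option mentioned above could look roughly like this. It is a sketch under assumptions: `parse_with_extra_columns` is a hypothetical stand-in for a `read_csv` call, `ParserWarning` is an assumed category for the new warning, and `pandas._testing` is a private module whose interface may change.

```python
import warnings

import pandas._testing as tm
from pandas.errors import ParserWarning


def parse_with_extra_columns():
    # Hypothetical stand-in for a read_csv call that emits the new warning.
    warnings.warn("Rows have more columns than expected", ParserWarning)
    return "parsed"


# The context manager fails the test if no ParserWarning is raised inside it.
# check_stacklevel=False skips the stack-level check, which is not meaningful
# for this stand-in function.
with tm.assert_produces_warning(ParserWarning, check_stacklevel=False):
    result = parse_with_extra_columns()
```

The alternative raised in the same comment, fixing the test inputs so they no longer contain extra values, avoids the warning entirely and needs no such wrapper.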

jreback · 2021-01-01T21:52:45Z

hmm this looks like overlapping with #38587

jreback · 2021-01-01T21:54:02Z

closing in favor of #38587

mproszewska added 2 commits April 25, 2020 03:52

Add warnings when rows in csv file have too many values

b042377

Remove unused variable

89a04c5

gfyoung added the "API Design" and "IO CSV" (read_csv, to_csv) labels May 3, 2020

gfyoung reviewed May 3, 2020

pandas/io/parsers.py (resolved)

gfyoung reviewed May 3, 2020

gfyoung reviewed May 5, 2020

pandas/tests/io/parser/test_common.py (resolved)

mproszewska added 4 commits May 5, 2020 02:19

Add helper function

23c9109

Add comma in test

77537c2

Merge branch 'master' into csv

5c0dfb4

Include index_col and usecols in check

9bb7a86

mproszewska added 2 commits May 10, 2020 21:58

Run black

2d661e8

Add docstring

61d66ab

mproszewska added 13 commits May 15, 2020 17:38

PERF: Remove unnecessary copies in sorting functions

c94b45e

Run tests

0ab450b

Run tests

54c7304

Move function

e00993d

Add asv

6d72a34

Run black

5ba54a6

Remove asv

2766270

Merge branch 'perf'

91176ca

Run tests

412cd45

Merge remote-tracking branch 'upstream/master'

f748b78

Merge remote-tracking branch 'upstream/master'

c04c494

Merge branch 'master' into csv

f1807ee

Remove newline

4d7c568

mproszewska added 3 commits June 3, 2020 02:02

Fix

bbe77ca

Add asv

d9aa319

Fix

0afb1b1

mproszewska and others added 15 commits October 8, 2020 21:16

Add warnings when rows in csv file have too many values

35539d0

Remove unused variable

358113b

Add helper function

ab22429

Add comma in test

996213d

Include index_col and usecols in check

17d9b12

Run black

44a5da5

Add docstring

c191274

Move function

0567294

Run tests

31c9bd0

Remove newline

9a84498

Fix

459250b

Merge branch 'csv' of https://github.com/mproszewska/pandas into csv

cd0ad9e

Resolve conflicts

cd1239f

Merge branch 'master' into csv

6ad230a

Run black

18f3767

jreback closed this Jan 1, 2021
